To be successful, cybercriminals must figure out how to scale their scams. They duplicate content on new websites, often staying one step ahead of defenders who shut down past schemes. For some scams, such as phishing and counterfeit goods shops, the duplicated content remains nearly identical. In others, such as advanced-fee fraud and online Ponzi schemes, the criminal must alter content so that it appears different in order to evade detection by victims and law enforcement. Nevertheless, similarities often remain in the website structure or content, since making truly unique copies does not scale well. In this paper, we present a novel optimized combined clustering method that links together replicated scam websites, even when the criminal has taken steps to hide connections. We present automated methods to extract key website features, including rendered text, HTML structure, file structure, and screenshots. We describe a process to automatically identify the combination of these attributes that most accurately clusters similar websites together. To demonstrate the method's applicability to cybercrime, we evaluate its performance against two collected datasets of scam websites: fake escrow services and high-yield investment programs (HYIPs). We show that our method groups similar websites together more accurately than existing general-purpose consensus clustering methods.
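To make the idea of combining attributes concrete, the following is a minimal sketch rather than the method evaluated in this paper. It assumes each website has already been reduced to one set of tokens per feature, uses Jaccard similarity, combines features by taking the best per-feature score, and links sites with a fixed threshold. The function names, the `threshold` value, the toy `sites` data, and the max-combination rule are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(site_a, site_b, features):
    """Assumed combination rule: keep the best score over the chosen features."""
    return max(jaccard(site_a[f], site_b[f]) for f in features)


def cluster_sites(sites, features, threshold=0.5):
    """Link every pair scoring at or above the threshold (single linkage)."""
    parent = {name: name for name in sites}

    def find(x):
        # Follow parent pointers to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sites, 2):
        if combined_similarity(sites[a], sites[b], features) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sites:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: two escrow sites share markup but reword their text.
sites = {
    "escrow-a.example": {
        "text": {"secure", "escrow", "payment", "trusted"},
        "tags": {"html", "body", "div", "form"},
    },
    "escrow-b.example": {
        "text": {"safe", "escrow", "transfer", "fast"},
        "tags": {"html", "body", "div", "form"},
    },
    "hyip-c.example": {
        "text": {"invest", "daily", "profit", "plan"},
        "tags": {"html", "body", "table", "img"},
    },
}

text_only = cluster_sites(sites, ["text"])
text_and_tags = cluster_sites(sites, ["text", "tags"])
```

In this toy data, clustering on text alone leaves the two escrow sites apart because their wording was changed, while adding the HTML tag feature links them. This illustrates why a combination of attributes can recover connections that a single attribute misses.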